refactor(span)!: use VecMap for meta, metrics and meta_struct for v04 spans#2043
refactor(span)!: use VecMap for meta, metrics and meta_struct for v04 spans#2043yannham wants to merge 12 commits into
meta, metrics and meta_struct for v04 spans#2043Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: e93e908 | Docs | Datadog PR Page | Give us feedback! |
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
📚 Documentation Check Results📦
|
🔒 Cargo Deny Results📦
|
meta, metrics and meta_struct
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## yannham/vecmap-as-dedup #2043 +/- ##
===========================================================
- Coverage 72.92% 72.91% -0.01%
===========================================================
Files 460 460
Lines 76481 76558 +77
===========================================================
+ Hits 55774 55824 +50
- Misses 20707 20734 +27
🚀 New features to boost your workflow:
|
1c727ec to
a9e644d
Compare
| size += k.as_ref().len() + v.as_ref().len(); | ||
| } | ||
| for k in self.metrics.keys() { | ||
| for (k, _) in &self.metrics { |
There was a problem hiding this comment.
TODO: this is wrong, because keys can be duplicated.
Solution 1: don't dedup. The size will be over-estimated.
Solution 2: deduplicate here. Costly (2 times with serialization).
Solution 3: deduplicate upfront (and eg keep a dirty flag in the vecmap to avoid double dedup).
There was a problem hiding this comment.
Now that we have a dirty flag maybe this should be changed to solution 3?
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
681d36f to
95d3d03
Compare
meta, metrics and meta_structmeta, metrics and meta_struct
meta, metrics and meta_structmeta, metrics and meta_struct for v04 spans
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 95d3d03685
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
74358f8 to
7e38b23
Compare
2d45a7f to
8f43f82
Compare
d3b289b to
3dc15ea
Compare
The trace_buffer benchmark was still using HashMap for Span meta, metrics, and meta_struct fields after the VecMap migration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The decoder functions were returning Vec<(K, V)> which callers immediately converted to VecMap via .into(). Return VecMap directly to avoid the unnecessary intermediate type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3dc15ea to
57b3ec4
Compare
| /// from a static str and check if the string is empty. | ||
| pub trait SpanText: Debug + Eq + Hash + Borrow<str> + Serialize + Default + From<String> { | ||
| pub trait SpanText: | ||
| Debug + Eq + Hash + Clone + Borrow<str> + Serialize + Default + From<String> |
There was a problem hiding this comment.
I think it's ok to add Clone, since it can always be derived as From<String> + Borrow<str> (although in a potentially unoptimal way) ?
There was a problem hiding this comment.
Why do you need it? SpanText used to be Clone until I removed it for the python span text (because cloning without allocation requires holding the GIL)
I think we don't want people to use this API in general, even if it's derivable as it makes it harder to make performance issues
There was a problem hiding this comment.
Ok I think I found why...
VecMap::dedup does self.data.retain(|(k, _)| seen.insert(k.clone())); reuiring the Clone trait bound on keys, which I don't think is actually required. You could just insert a ref (I think?)
| const PARTIAL_VERSION_KEY: &str = "_dd.partial_version"; | ||
|
|
||
| fn set_top_level_span<T>(span: &mut Span<T>, is_top_level: bool) | ||
| fn set_top_level_span<T>(span: &mut Span<T>) |
There was a problem hiding this comment.
unrelated change, but is_top_level is always used with false, so... it does get rid of a remove call.
ce67621 to
7328f51
Compare
| size += k.as_ref().len() + v.as_ref().len(); | ||
| } | ||
| for k in self.metrics.keys() { | ||
| for (k, _) in &self.metrics { |
There was a problem hiding this comment.
Now that we have a dirty flag maybe this should be changed to solution 3?
| /// from a static str and check if the string is empty. | ||
| pub trait SpanText: Debug + Eq + Hash + Borrow<str> + Serialize + Default + From<String> { | ||
| pub trait SpanText: | ||
| Debug + Eq + Hash + Clone + Borrow<str> + Serialize + Default + From<String> |
There was a problem hiding this comment.
Why do you need it? SpanText used to be Clone until I removed it for the python span text (because cloning without allocation requires holding the GIL)
I think we don't want people to use this API in general, even if it's derivable as it makes it harder to make performance issues
| /// from a static str and check if the string is empty. | ||
| pub trait SpanText: Debug + Eq + Hash + Borrow<str> + Serialize + Default + From<String> { | ||
| pub trait SpanText: | ||
| Debug + Eq + Hash + Clone + Borrow<str> + Serialize + Default + From<String> |
There was a problem hiding this comment.
Ok I think I found why...
VecMap::dedup does self.data.retain(|(k, _)| seen.insert(k.clone())); reuiring the Clone trait bound on keys, which I don't think is actually required. You could just insert a ref (I think?)
| /// Null values are skipped (key not inserted into map). | ||
| /// Null values are skipped (key not inserted into vec). | ||
| #[inline] | ||
| pub fn read_str_map_to_strings<T: DeserializableTraceData>( |
There was a problem hiding this comment.
Maybe this should be renamed to read_str_map_to_vecmap , because the name is confusing when compared to read_str_map_to_hashmap
Depends on #2022 and #2049.
What does this PR do?
This PR continues the native span performance work required to land native spans in dd-trace-js. Follow-up of #2022. Actually use
VecMapin theSpandata structure.Motivation
Performance improvement following dd-trace-js native span experiments. See #2022.
Additional considerations
There are some deduplication design choices. We deduplicate before serializing to the agent, because while the current agent implementation would support duplicate keys in a map (with the same semantics of "last one wins"), this is not part of the spec/API, so we can't rely on it. We also deduplicate defensively in the msgpack encoder, but it should already be deduped at this stage.
For byte estimate (
byte_size()), we make the choice of not deduplicating. This means potentially overestimating the size of a chunk (and reaching the limit sooner), in exchange of delaying the deduplication to serialization time.How to test the change?
Run tests.